Problems of Further Development of the Group Method of Data Handling Algorithms. Part I

Author

  • G. A. Ivakhnenko
Abstract

The GMDH algorithms for solving interpolation problems of artificial intelligence differ from one another in two respects. The first is the form of the reference function. The second is the iteration rules used to build the multilayer model structure. In some multilayer algorithms, the number of terms in the iteration rule is constant, which causes some models to be skipped. In the so-called combinatorial algorithm, the iteration rule grows by one term at each successive row, which ensures an exhaustive search through all of the equations. For exact and complete data, the minimum of the external criterion is not sharp. To locate the optimum, the locus of the points of minimum of the external criterion should then be extrapolated. Comparing reference functions that are linear, polynomial, and ratio-polynomial (with respect to the coefficients) may suggest a way to improve the accuracy of problem solutions. To reduce computational time, a threshold GMDH algorithm is developed. It first estimates the effectiveness of the input variables (arguments or features) at the information level. It then searches for candidate models built from the most effective of these variables.

MATHEMATICAL THEORY OF PATTERN RECOGNITION

Received December 27, 1999

1. PREPROCESSING OF THE INITIAL DATA SAMPLE

It is expedient to adopt the following procedure for preprocessing large samples of initial data:
(1) “Wild points” (values of variables that are obviously impossible) are removed and replaced by three times the mean deviation.
(2) The mean value of each variable is calculated after the wild points have been removed.
(3) Missing values in the sample are replaced by the mean value.
(4) The quantitative variables are normalized to the range from 0 to 1.
(5) Each qualitative variable is assigned the value 0 or 1, depending on its class.

If several patterns must be recognized, the sample is divided into several subsamples, and the patterns (classes) are recognized pairwise.

2. ACHIEVEMENT OF MODEL UNIQUENESS AND ESTIMATION OF ITS REPRESENTATIVENESS

In recursive search modeling methods, the initial data are given as a sample of experimental data. This sample is a table with N rows, called observations, realizations, or (in pattern recognition) images. The model sought is an equation that expresses the output variable in terms of the current and past (lagged) values of the input variables. In the GMDH algorithms, the model is sought in the form of a linear polynomial. The complexity of the model structure is the number of terms in this polynomial. The optimal model, the most accurate one at the given noise level, corresponds to the minimum of the external accuracy criterion.

We use the term specific complexity for the ratio S/(M + 1). Here S is the number of terms in the polynomial describing the model, and M is the number of input variables (primary and secondary features). One is added to M to account for the free term. A plot of the external criterion against the specific complexity of the model shows two things: how many minima the dependence has, and whether the sample is sufficient (representative). The number of models that can be obtained from a given sample equals the number of sufficiently sharp minima of this dependence. A sample is considered sufficient, or representative, if it gives only one sharp minimum.

Uniqueness of the minimum can be achieved by dividing the sample into parts according to the clusters of the optimal physical clustering. The clusters are ordered by decreasing number of points. The sample for the first model contains only the rows of the first cluster, the sample for the second model contains only the points of the second cluster, and so on. The most representative model is the one that gives the absolute minimum of the external criterion.
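
The five preprocessing steps of Section 1 can be sketched in Python as follows. This is a minimal illustration that rests on assumptions not found in the paper:
- A value counts as a “wild point” when it falls outside a caller-supplied plausible range given by the hypothetical parameters `lo` and `hi`.
- Missing values are represented by `None`.
- “Replaced by three times the mean deviation” is read as replacing the wild point with the mean plus or minus three standard deviations of the remaining values.

The function names `preprocess_quantitative` and `encode_qualitative` are illustrative only.

```python
from statistics import mean, pstdev


def preprocess_quantitative(values, lo, hi):
    """Steps (1)-(4) for one quantitative variable (illustrative sketch).

    Assumptions, not taken from the paper:
      * a "wild point" is any value outside the plausible range [lo, hi];
      * missing observations are given as None;
      * a wild point is replaced by mean +/- 3 * standard deviation.
    Raises statistics.StatisticsError if no plausible value is present.
    """
    # Step (2): statistics of the observed values with wild points removed.
    valid = [v for v in values if v is not None and lo <= v <= hi]
    mu = mean(valid)
    sigma = pstdev(valid)

    cleaned = []
    for v in values:
        if v is None:
            cleaned.append(mu)                # step (3): fill missing with the mean
        elif v > hi:
            cleaned.append(mu + 3 * sigma)    # step (1): high wild point
        elif v < lo:
            cleaned.append(mu - 3 * sigma)    # step (1): low wild point
        else:
            cleaned.append(v)

    # Step (4): min-max normalization into the range [0, 1].
    vmin, vmax = min(cleaned), max(cleaned)
    span = vmax - vmin
    if span == 0:
        return [0.0] * len(cleaned)
    return [(v - vmin) / span for v in cleaned]


def encode_qualitative(labels, positive):
    """Step (5): code a two-class qualitative variable as 1 or 0."""
    return [1 if label == positive else 0 for label in labels]
```

Consider `preprocess_quantitative([1.0, 2.0, None, 3.0, 1000.0], 0, 10)`. The call fills the missing entry with 2.0, the mean of the plausible values, and replaces 1000.0 by 2.0 + 3σ. It then rescales everything into [0, 1], so the smallest value maps to 0.0 and the replaced wild point maps to 1.0.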

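The exhaustive search of the combinatorial algorithm and the specific complexity S/(M + 1) can be illustrated with a small sketch. It relies on assumptions not taken from the paper:
- Every model contains the free term plus a subset of the M inputs.
- Coefficients are fitted by least squares on a training subsample.
- The external criterion is the mean squared error on a separate checking subsample, a regularity-type criterion.

For each value of specific complexity, the sketch keeps the best model found. These values give the points of the criterion-versus-complexity curve described in Section 2. All function names are hypothetical.

```python
from itertools import combinations


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        if abs(p) < 1e-12:
            raise ValueError("singular normal equations")
        for r in range(n):
            if r != col:
                f = m[r][col] / p
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def design_row(x, subset):
    """Free term followed by the selected input variables."""
    return [1.0] + [x[j] for j in subset]


def fit(xs, ys, subset):
    """Least-squares coefficients via the normal equations (A^T A) c = A^T y."""
    rows = [design_row(x, subset) for x in xs]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(ata, aty)


def external_criterion(coef, xs, ys, subset):
    """Mean squared error on the checking subsample (assumed criterion)."""
    errs = [
        (sum(c * v for c, v in zip(coef, design_row(x, subset))) - y) ** 2
        for x, y in zip(xs, ys)
    ]
    return sum(errs) / len(errs)


def combinatorial_search(train, check):
    """Try every subset of inputs, growing the model one term at a time.

    Returns {specific complexity S/(M+1): (criterion, subset, coefficients)}
    holding the best model found at each complexity level.
    """
    (xa, ya), (xb, yb) = train, check
    m = len(xa[0])  # M, the number of input variables
    best = {}
    for size in range(m + 1):
        for subset in combinations(range(m), size):
            coef = fit(xa, ya, subset)
            crit = external_criterion(coef, xb, yb, subset)
            spec = (size + 1) / (m + 1)  # S counts the free term too
            if spec not in best or crit < best[spec][0]:
                best[spec] = (crit, subset, coef)
    return best


if __name__ == "__main__":
    xa = [[0.0, 1.0], [1.0, 0.0], [2.0, 3.0], [3.0, 1.0], [4.0, 2.0]]
    xb = [[5.0, 0.0], [6.0, 2.0], [1.0, 1.0]]
    ya = [1 + 2 * x[0] for x in xa]  # exact data: y = 1 + 2*x0
    yb = [1 + 2 * x[0] for x in xb]
    for spec, (crit, subset, _) in sorted(combinatorial_search((xa, ya), (xb, yb)).items()):
        print(f"S/(M+1) = {spec:.3f}  inputs = {subset}  criterion = {crit:.3g}")
```

In the demo, the data are exact and noise-free, generated by y = 1 + 2·x0 with an irrelevant second input x1. Both the model {x0} (S/(M + 1) = 2/3) and the full model {x0, x1} (S/(M + 1) = 1) reach an essentially zero criterion. This is a small instance of the nonsharp minimum that the Abstract associates with exact and complete data.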
Similar resources

Pareto Optimization of Two-element Wing Models with Morphing Flap Using Computational Fluid Dynamics, Grouped Method of Data handling Artificial Neural Networks and Genetic Algorithms

A multi-objective optimization (MOO) of two-element wing models with morphing flap by using computational fluid dynamics (CFD) techniques, artificial neural networks (ANN), and non-dominated sorting genetic algorithms (NSGA II), is performed in this paper. At first, the domain is solved numerically in various two-element wing models with morphing flap using CFD techniques and lift (L) and drag ...

Modeling and Hybrid Pareto Optimization of Cyclone Separators Using Group Method of Data Handling (GMDH) and Particle Swarm Optimization (PSO)

In the present study, a three-step multi-objective optimization algorithm for cyclone separators is developed to meet the design objectives. First, the pressure drop (Dp) and collection efficiency (h) in a set of cyclone separators are evaluated numerically. Second, two metamodels based on evolved Group Method of Data Handling (GMDH) type neural networks are used to model Dp and h as the re...

Assessment of Lateral Displacements using Neuro-Fuzzy Group Method of Data Handling Systems

Lateral spreading is one of the most destructive effects of liquefaction. Liquefaction is known as one of the major causes of ground failure related to earthquake. This phenomenon is likely to occur when the rate of earthquake-induced excess pore water pressure buildup exceeds the rate of drainage. Estimation of the hazard of lateral spreading requires characterization of subsurface conditions....

A new evaluation model for selecting a qualified manager by using fuzzy Topsis approach

Considering the contemporary business settings, managers’ role is more than essential to the viability and further development of an organization. Managers should possess such skills in order to effectively cope with the competition. Multiple attributes decision making (MADM) is an approach employed to solve problems involving selection from among a finite number of alternatives. The aim of thi...

The Prioritization of Rural Development Problems with Emphasis on Peasants Viewpoint

Throughout the world, rural areas tend to share similar characteristics. Populations are spatially distributed. Agriculture is considered the dominant economic sector. However, this sector, as well as rural people, faces some development challenges. In other words, there are some limitations as far as resource mobilization is concerned. Sometimes spatial rural...

AN OPTIMIZED NEURO-FUZZY GROUP METHOD OF DATA HANDLING SYSTEM BASED ON GRAVITATIONAL SEARCH ALGORITHM FOR EVALUATION OF LATERAL GROUND DISPLACEMENTS

During an earthquake, significant damage can result due to instability of the soil in the area affected by internal seismic waves. A liquefaction-induced lateral ground displacement has been a very damaging type of ground failure during past strong earthquakes. In this study, neuro-fuzzy group method of data handling (NF-GMDH) is utilized for assessment of lateral displacement in both ground sl...

Publication date: 2000